Simple Words in Equality Sets

Authors

  • Marjo Lipponen
  • Arto Salomaa
Abstract

It is well known that equality sets between two morphisms possess a remarkable generative capacity: an arbitrary recursively enumerable set is obtained from an equality set by certain simple operations. Interconnections between simplicity of computations and structural primitivity of words in equality sets have also been observed. The paper discusses recent work in this area, pointing out certain open problems and emphasizing new directions for research.

TUCS Research Group: Mathematical Structures of Computer Science

1 Post Correspondence Problem and expressibility of equality sets

We view an instance of the Post Correspondence Problem, PCP, as a pair $(g, h)$ of morphisms $g, h : \Sigma^* \to \Delta^*$. A solution for this instance $(g, h)$ is any word in the equality set (or equality language)

$$E(g, h) = \{ w \in \Sigma^+ \mid g(w) = h(w) \}.$$

Thus, the equality set being empty means that there are no solutions; the empty word is not considered to be a solution.

For example, consider the domain and target alphabets $\Sigma = \{1, 2, 3\}$ and $\Delta = \{a, b\}$ and define an instance $(g, h)$ of the PCP by the table

         1         2        3
    g    $a^2$     $b^2$    $ab^2$
    h    $a^2b$    $ba$     $b$

Every solution must begin with the letter 1, after which 2, 1 and 3 must follow, in this order. This is due to the fact that, for any word $w_0$, whenever $w_0$ begins a solution $w$, $w = w_0x$, then one of the words $g(w_0)$ and $h(w_0)$ must also be a prefix of the other. For our instance $(g, h)$, the situation can be described by the table

    $w_0$       1         12           121             1213
    $g(w_0)$    $a^2$     $a^2b^2$     $a^2b^2a^2$     $a^2b^2a^3b^2$
    $h(w_0)$    $a^2b$    $a^2b^2a$    $a^2b^2a^3b$    $a^2b^2a^3b^2$

The table shows that $w_0 = 1213$ is actually a solution.

Could we build up the word $w_0$ in a different way? Having in mind the above-mentioned prefix condition concerning $g(w_0)$ and $h(w_0)$, we see that $w_0$ must begin with 12. But after that we have also the possibility 3 because $h(123) = a^2b^2ab$ is a prefix of $g(123) = a^2b^2ab^2$. After this choice, our only further possibility is 3, leading to the words $h(1233) = a^2b^2ab^2$ and $g(1233) = a^2b^2ab^2ab^2$.
But now this road is blocked; there is no way we can continue. We should take care of the excess part $ab^2$ using the $h$-words, but there is no word $x$ such that $ab^2$ is a prefix of $h(x)$. This analysis shows that the choices indicated in the above table are the only ones leading to a solution. Clearly, an arbitrary catenation of solutions is again a solution. Thus, we have characterized all solutions for this particular instance $(g, h)$ of the PCP:

$$E(g, h) = (1213)^+.$$

It is obvious that the set of solutions is always a star language: whenever $L = E(g, h)$ for some $(g, h)$, then $L = L^+$. If a solution $w$ is a product of some other solutions, say $w = w_1w_2w_3$, it is natural to regard the factors $w_i$ as "simpler" or "more primitive" solutions. Indeed, various types of "prime solutions" can be defined. The purpose of the present article is to survey recent work carried out along these lines, as well as to point out some further directions and problems. Apart from shedding light on equality sets and the PCP in general, such primality considerations have intrinsic connections with certain aspects of combinatorics on words, as will become apparent below.

Let us still go back to the instance $(g, h)$ discussed above. It is seen that, on the road to success, the morphism $h$ "runs faster" than $g$: for any prefix $w_0$ of a solution, $g(w_0)$ is a prefix of $h(w_0)$. When the morphism $g$ "catches up" with the morphism $h$, a solution has been found. Exactly the same idea is used in establishing general representability results in terms of equality sets. One builds up a word describing a computation or a derivation step by step, using two morphisms. One of them runs faster. If the other catches up, which happens for words in the equality set, it means that such a computation or derivation is actually possible. To get the representability result, the final result of the computation or derivation has still to be "squeezed out" from the long word obtained.
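
The prefix analysis of the instance $(g, h)$ above can be replayed mechanically. The following is a minimal sketch, not part of the original paper: the names `G`, `H`, `apply`, `compatible` and `solutions_up_to` are illustrative, and the powers in the table are written out letter by letter. It explores only those words for which one image is a prefix of the other and collects the solutions found up to a length bound.

```python
# Instance (g, h) from the table above; a^2 is written out as "aa", and so on.
G = {"1": "aa", "2": "bb", "3": "abb"}
H = {"1": "aab", "2": "ba", "3": "b"}


def apply(morphism, word):
    """Extend a letter-to-word table to a morphism on words."""
    return "".join(morphism[letter] for letter in word)


def compatible(word):
    """Prefix condition: one of g(word), h(word) is a prefix of the other."""
    gw, hw = apply(G, word), apply(H, word)
    return gw.startswith(hw) or hw.startswith(gw)


def solutions_up_to(max_len):
    """Level-by-level search over compatible words, collecting solutions."""
    found, frontier = [], [""]
    for _ in range(max_len):
        # Extend every surviving word by one letter and keep the compatible ones.
        frontier = [w + c for w in frontier for c in "123" if compatible(w + c)]
        found += [w for w in frontier if apply(G, w) == apply(H, w)]
    return found


print(solutions_up_to(8))  # ['1213', '12131213']
```

Within the bound of eight letters the search finds only 1213 and its square, in agreement with $E(g, h) = (1213)^+$; the branch through 1233 survives the prefix test but has no compatible extension.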
Let us become more specific and prove that an arbitrarily given recursively enumerable language $L \subseteq T^*$ can be represented in the form

$$L = h_T(E(g, h) \cap R), \tag{1}$$

for some morphisms $g, h$, regular language $R$ and weak identity $h_T$. (That is, $h_T$ is a morphism mapping the letters of $T$ into themselves and erasing all other letters.) The proof uses the ideas indicated above and will refer to derivations and grammars. A similar construction referring to computations and Turing machines is given in Lemma 3.1 of [MSSY].

Thus, let $L \subseteq T^*$ be a language generated by a (type 0) grammar $G = (N, T, S, P)$, where the production set $P$ consists of the productions

$$p_i : \alpha_i \to \beta_i, \quad 1 \le i \le n.$$

(Observe that we associate a label $p_i$ to each production.) We consider also "primed versions" of letters by defining $T' = \{a' \mid a \in T\}$. The domain and target alphabets of our two morphisms $g$ and $h$ will be

$$\Sigma = \Delta \cup P \cup T' \cup \{B, F\}, \qquad \Delta = N \cup T \cup \{\Rightarrow\},$$

where, intuitively, $B$ and $F$ are beginning and final letters and $\Rightarrow$ marks the yield relation between two derivation steps. The two morphisms themselves are defined by the table

         $B$               $\Rightarrow$    $p_i \in P$ ($p_i : \alpha_i \to \beta_i$)    $A \in N$    $a' \in T'$    $a \in T$    $F$
    g    $S\Rightarrow$    $\Rightarrow$    $\beta_i$                                     $A$          $a$            $\lambda$    $\lambda$
    h    $\lambda$         $\Rightarrow$    $\alpha_i$                                    $A$          $a$            $a$          $\Rightarrow$

where $\lambda$ denotes the empty word. Further, we define the regular language $R$ by

$$R = B(V^* P V^* {\Rightarrow})^+ T^* F, \quad \text{where } V = N \cup T'.$$

We omit the detailed argument showing that (1) is valid; the interested reader is referred to [Sa1]. The construction follows ideas presented above. The operation $\cap R$ restricts the possibilities to words of $E(g, h)$ resembling derivations. The word being in $E(g, h)$ guarantees that the derivation is properly performed. The morphism $g$ runs faster than $h$, and $h_T$ erases everything except the final result of the derivation. Specifically, whenever

$$S \Rightarrow \gamma_1 \Rightarrow \gamma_2 \Rightarrow \cdots \Rightarrow \gamma_k$$

is a derivation according to $G$, then there is a word $w \in B(V^* P V^* {\Rightarrow})^+$ such that

$$g(w) = S \Rightarrow \gamma_1 \Rightarrow \gamma_2 \Rightarrow \cdots \Rightarrow \gamma_k \Rightarrow, \qquad h(w) = S \Rightarrow \gamma_1 \Rightarrow \gamma_2 \Rightarrow \cdots \Rightarrow \gamma_{k-1} \Rightarrow.$$

Moreover, if $\gamma_k = x \in T^*$, then $wxF \in E(g, h) \cap R$ and $h_T(wxF) = x$. The operation $\cap R$ is needed to rule out unwanted words.
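
To make the construction concrete, here is a small sketch under stated assumptions. The toy grammar $S \to aSb \mid ab$, the token spellings (`"=>"` for the yield marker, `a'` for a primed letter) and all function names are hypothetical choices made for illustration; the entries of $g$ and $h$ that map a letter to the empty word are taken as reconstructed above. The sketch builds $g$ and $h$ from the grammar and checks one encoded derivation.

```python
# Toy grammar (an assumption for illustration): S -> aSb | ab.
ARROW = "=>"
PRODUCTIONS = {"p1": (["S"], ["a", "S", "b"]), "p2": (["S"], ["a", "b"])}
NONTERMINALS, TERMINALS = ["S"], ["a", "b"]

g, h = {}, {}
g["B"], h["B"] = ["S", ARROW], []        # B: g writes "S =>", h writes nothing
g[ARROW], h[ARROW] = [ARROW], [ARROW]    # the yield marker is copied by both
for label, (lhs, rhs) in PRODUCTIONS.items():
    g[label], h[label] = rhs, lhs        # g writes the right side, h the left side
for A in NONTERMINALS:
    g[A], h[A] = [A], [A]
for a in TERMINALS:
    g[a + "'"], h[a + "'"] = [a], [a]    # primed terminals are copied unprimed
    g[a], h[a] = [], [a]                 # unprimed terminals: only h copies them
g["F"], h["F"] = [], [ARROW]             # F lets h catch up with g


def image(morphism, word):
    """Apply a morphism given as a table to a word (a list of tokens)."""
    out = []
    for letter in word:
        out.extend(morphism[letter])
    return out


def weak_identity(word):
    """h_T: keep the letters of T, erase everything else."""
    return [letter for letter in word if letter in TERMINALS]


# The derivation S => aSb => aabb, encoded as a word of B (V* P V* =>)+ T* F.
w = ["B", "p1", ARROW, "a'", "p2", "b'", ARROW, "a", "a", "b", "b", "F"]

print(image(g, w) == image(h, w))   # True
print("".join(weak_identity(w)))    # aabb
```

Both images spell out `S => a S b => a a b b =>`, so the encoded word lies in the equality set, and the weak identity squeezes out the derived terminal word. Replacing `p1` by `p2` in the first block breaks the equality, since $h$ then reproduces a sentential form that $g$ never wrote.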
Repetitions of a successful derivation of $x$ give rise to a word in the equality set $E(g, h)$ and consequently, without the operation $\cap R$, we would get all powers $x^i$ together with $x$. This can be avoided, as done in [C], if only words from $E(g, h)$ are taken into account such that no proper prefix of the word is in $E(g, h)$. Such solutions of the instance $(g, h)$ are called F-prime in the sequel. Thus, the representability result (1) becomes simpler if the set of all solutions $E(g, h)$ is replaced by the set of F-prime solutions. This is one of the early reasons for investigating primitive solutions of the PCP. Further interconnections between the primitivity of solutions and complexity of computations are indicated in [MS2] and [MSSY].

In the representation result (1), the "squeezing" mechanisms $\cap R$ and $h_T$ can be expressed as one gsm-mapping $f$: $L = f(E(g, h))$. This follows because both $\cap R$ and $h_T$ are gsm-mappings, and gsm-mappings are closed under composition. (We note in passing that the abbreviation "gsm" for "generalized sequential machine" was well known already in the early 70's, whereas nowadays the principal meaning of "gsm" is quite different. Can the reader guess what the principal meaning of the abbreviation "PCP" is? Several dictionaries seem to agree in this respect: none of them knows "Post Correspondence Problem"!)

The representation result (1) can be modified to a form that is particularly interesting in view of the recent studies concerning DNA-computing. Consider the binary alphabet $\Sigma_2 = \{0, 1\}$ and its barred version $\bar\Sigma_2 = \{\bar 0, \bar 1\}$. The barred version $\bar w$ of a word $w \in \Sigma_2^*$ is obtained by barring every letter of $w$. The twin-shuffle language $L_{ts}$ over the alphabet $\Sigma_2 \cup \bar\Sigma_2$ is defined by

$$L_{ts} = \{ w \in (\Sigma_2 \cup \bar\Sigma_2)^* \mid \overline{h_{\Sigma_2}(w)} = h_{\bar\Sigma_2}(w) \},$$

where $h_{\Sigma_2}$ and $h_{\bar\Sigma_2}$ are weak identities in the sense explained for $h_T$ above. Thus, $L_{ts}$ consists of words obtained by taking an arbitrary word $x \in \Sigma_2^*$, its barred version $\bar x \in \bar\Sigma_2^*$ and shuffling the two in a fashion otherwise arbitrary, but preserving the order of letters in both $x$ and $\bar x$.
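
Membership in the twin-shuffle language is easy to test directly from the definition. The sketch below is illustrative only: the representation of a barred letter as a token with a leading `~`, and the function names, are assumptions made here and do not come from the paper.

```python
from itertools import product

LETTERS = ["0", "1", "~0", "~1"]   # "~0" and "~1" stand for the barred letters


def bar(word):
    """Barred version of a word over {0, 1}."""
    return ["~" + letter for letter in word]


def unbarred_part(word):
    """Weak identity onto the unbarred alphabet."""
    return [letter for letter in word if not letter.startswith("~")]


def barred_part(word):
    """Weak identity onto the barred alphabet."""
    return [letter for letter in word if letter.startswith("~")]


def in_twin_shuffle(word):
    """w is in L_ts iff barring its unbarred part gives its barred part."""
    return bar(unbarred_part(word)) == barred_part(word)


print(in_twin_shuffle(["0", "~0", "~1", "1"]))   # True
print(in_twin_shuffle(["0", "~1"]))              # False
print(sum(in_twin_shuffle(list(w)) for w in product(LETTERS, repeat=2)))  # 4
```

The last line counts the words of length two in the language: a letter and its barred twin, in either order.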
(The reader is referred to [MS3] for the basics of the shuffle operation.)

The twin-shuffle language can be defined also as an equality set. Let $h'_{\Sigma_2} : (\Sigma_2 \cup \bar\Sigma_2)^* \to \Sigma_2^*$ be the morphism defined by

$$h'_{\Sigma_2}(a) = \begin{cases} \lambda & \text{for } a \in \Sigma_2, \\ b & \text{for } a = \bar b \in \bar\Sigma_2. \end{cases}$$

Then $L_{ts} = E(h_{\Sigma_2}, h'_{\Sigma_2}) \cup \{\lambda\}$. Instead of starting with the binary alphabet $\Sigma_2$, we can define in a similar way twin-shuffle languages for other alphabets as well.

The representation result (1) assumes now the following form. Let again $L \subseteq T^*$ be an arbitrary recursively enumerable language. Then there is a gsm-mapping $f_L$ such that

$$L = f_L(L_{ts}). \tag{2}$$

Thus, the role of $L_{ts}$ is universal. Everything specific about the given language $L$ is implemented in the gsm-mapping $f_L$.

The representation (2) is obtained from (1) roughly as follows. ([Sa1, pp. 114-117] contains further details.) The restrictive capacity of the intersection $E(g, h) \cap R$ can be simulated by replacing the letters $b$ of $R$ with the words $g(b)\overline{h(b)}$, and then intersecting the resulting regular language $R(g, h)$ with a twin-shuffle language $L'_{ts}$. Originally, this language $L'_{ts}$ may involve more than two letters but it can be obtained from the binary twin-shuffle language $L_{ts}$ by an inverse gsm-mapping. Thus, we are applying to $L_{ts}$ the operations of (i) an inverse gsm-mapping, (ii) an intersection with the regular language $R(g, h)$ and (iii) a weak identity, whose composition amounts to one gsm-mapping $f_L$. A (nonempty) word $w$ is in the equality set $E(g, h)$ exactly in case the word $g(w)\overline{h(w)}$ is in the (appropriate) twin-shuffle language. This is the key idea.

This key idea also clearly points out the interconnection with DNA-computing. DNA consists of four bases, customarily denoted A (adenine), C (cytosine), G (guanine) and T (thymine). The chemical structure of these molecules causes a strong affinity between A and T, and likewise between C and G.
This affinity allows a string of bases, say TAGCAT, to pair up with its complementary string ATCGTA, customarily referred to as its Watson-Crick complement, the result being a double-stranded molecule

    TAGCAT
    ATCGTA

(We have somewhat simplified the matters here by not speaking of the polarity of the strings, which is inessential for us. We refer to [Lila] for further details.) Let us denote A, T, C, G by $0$, $\bar 0$, $1$, $\bar 1$, respectively. Let us, further, rewrite double strands as one-dimensional strings by taking characters from both strands by turns. Thus, our example becomes first

    $\bar 0\, 0\, \bar 1\, 1\, 0\, \bar 0$
    $0\, \bar 0\, 1\, \bar 1\, \bar 0\, 0$

and then $\bar 0 0 0 \bar 0 \bar 1 1 1 \bar 1 0 \bar 0 \bar 0 0$, a word in the twin-shuffle language! In this way Watson-Crick complementarity yields words in the twin-shuffle language. (Not all words in $L_{ts}$ are obtained in this fashion.)

So far many models have been proposed for DNA-computing. In all of them Watson-Crick complementarity plays a central role. The relation (2) guarantees the universality of computations in all models, provided the input-output format is sufficient to take care of the role of the gsm-mapping $f_L$ in (2). Further discussion about this matter lies beyond the scope of this article.

We conclude this introductory section with a further example. Consider the instance $(g_1, h_1)$ of the PCP defined by the table

           1        2       3
    $g_1$  $b^2$    $ab$    $c$
    $h_1$  $b$      $ba$    $bc$

This example is a modification of the one in Post's seminal paper [Post]. Clearly, $E(g_1, h_1) = (12^*3)^+$. Moreover, the set of F-prime solutions (in the sense indicated above) consists of all words in $12^*3$.

It is well known that PCP is one of the "cornerstones of undecidability", [RS], customarily used when a language-theoretic problem is shown to be undecidable. The simplest way of showing that PCP itself is undecidable is to use Post normal systems, see [Post, Sa2, RS].

2 Prime solutions and primality types

We have illustrated the expressive capabilities of equality sets from many different angles.
We wanted in this way to prepare background for primality considerations, the actual theme of this article. We already pointed out that some solutions of (an instance of) the PCP, that is, some words in the equality set, are "simpler" or "more primitive" than others. We will investigate various classes of such prime solutions, surveying recent work done in this area. It is clear that the structural simplicity of a word in an equality set is reflected also in representation results based on equality sets, a typical example being the result by Culik, [C], already mentioned above. However, many details concerning the significance of various primality aspects to the representation of computations are not yet properly understood and remain to be investigated further. In this respect it is interesting to note the fact established in [MSSY] that deterministic computations correspond to morphisms of bounded delay.

Let us still go back to the instance $(g_1, h_1)$ introduced at the end of Section 1. The words in the regular language $12^*3$ are solutions. Moreover, no proper final subword can be removed from such a word and still have a solution. However, words of the form $12^i3$, $i \ge 1$, are not "prime solutions" in view of the fact that the subword $2^i$, or some part thereof, can be removed and still have a solution. This leads to the notion of an S-prime solution: no proper subword can be removed and still have a solution. In this case 13 is the only S-prime solution. A still more restrictive notion is that of a P-prime solution: no scattered subword can be removed and still have a solution.

Let us give these definitions formally. Consider morphisms $g, h : \Sigma^* \to \Delta^*$. For a word $w$ over $\Sigma$, we consider three sets of words obtained from $w$ by removing a final subword, a subword or a scattered subword, respectively.
More specifically, define

$$\mathrm{fin}(w) = \{ v \mid w = vx, \text{ for some } x \in \Sigma^* \},$$
$$\mathrm{sub}(w) = \{ v_1v_2 \mid w = v_1xv_2, \text{ for some } v_1, v_2, x \in \Sigma^* \},$$
$$\mathrm{scatsub}(w) = \{ v_1 \cdots v_k \mid w = x_1v_1 \cdots x_kv_kx_{k+1}, \text{ for some } x_i, v_i \in \Sigma^* \}.$$

We want to emphasize that $\mathrm{sub}(w)$ is obtained by catenating the remaining parts when a subword of $w$ is removed. The set $\mathrm{scatsub}(w)$ can be viewed in an analogous way but also as the set of scattered subwords of $w$.

We can now determine three subsets of the equality set,

$$F(g, h) = \{ w \in E(g, h) \mid \mathrm{fin}(w) \cap E(g, h) = \{w\} \},$$
$$S(g, h) = \{ w \in E(g, h) \mid \mathrm{sub}(w) \cap E(g, h) = \{w\} \},$$
$$P(g, h) = \{ w \in E(g, h) \mid \mathrm{scatsub}(w) \cap E(g, h) = \{w\} \}.$$

Words in these three sets are called F-prime, S-prime and P-prime solutions for the instance $(g, h)$.

It follows immediately that every P-prime is an S-prime, and every S-prime is an F-prime, i.e.,

$$P(g, h) \subseteq S(g, h) \subseteq F(g, h) \subseteq E(g, h).$$

One or both of the first inclusions may be strict whereas the third inclusion is always strict, provided $E(g, h)$ is nonempty. If $E(g, h)$ is nonempty then so must be the three other sets. In addition, each of the four sets is recursive.

The triple $(p, s, f)$, where $p$, $s$ and $f$ are the cardinalities of the sets $P(g, h)$, $S(g, h)$ and $F(g, h)$, respectively, is defined to be the primality type of the instance $(g, h)$. Thus $p$, $s$ and $f$ are nonnegative integers or $\infty$. For the specific instance $(g, h)$ considered at the beginning of Section 1, we have

$$E(g, h) = (1213)^+ \quad \text{and} \quad P(g, h) = S(g, h) = F(g, h) = \{1213\}.$$

Thus, the primality type of this instance is $(1, 1, 1)$. The primality type of the instance $(g_1, h_1)$ considered at the end of Section 1 is $(1, 1, \infty)$.

A complete characterization of primality types was given in [SSY]. It is a straightforward consequence of the definition that if one of the components $p, s, f$ equals zero, all three components must equal zero. Similarly, we always have $p \le s \le f$.
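
The three notions of primality can be tested by brute force on short words. The following sketch is not from the paper; the function names and the length bound are illustrative assumptions. It implements fin, sub and scatsub literally and lists the prime solutions of the instance $(g_1, h_1)$ from the end of Section 1 among all words of length at most five.

```python
from itertools import combinations, product

# Instance (g1, h1) from the end of Section 1, with powers written out.
G1 = {"1": "bb", "2": "ab", "3": "c"}
H1 = {"1": "b", "2": "ba", "3": "bc"}


def apply(morphism, word):
    return "".join(morphism[letter] for letter in word)


def is_solution(word):
    """Membership in E(g1, h1); the empty word is not a solution."""
    return word != "" and apply(G1, word) == apply(H1, word)


def fin(w):
    """Words obtained by removing a final subword (all prefixes of w)."""
    return {w[:i] for i in range(len(w) + 1)}


def sub(w):
    """Words v1 v2 obtained by removing one subword x from w = v1 x v2."""
    return {w[:i] + w[j:] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def scatsub(w):
    """All scattered subwords of w."""
    return {"".join(w[i] for i in idx)
            for r in range(len(w) + 1)
            for idx in combinations(range(len(w)), r)}


def is_prime(w, removal):
    """w is prime if it is the only solution in removal(w)."""
    return is_solution(w) and {v for v in removal(w) if is_solution(v)} == {w}


def primes_up_to(max_len, removal):
    words = ("".join(t) for k in range(1, max_len + 1) for t in product("123", repeat=k))
    return sorted((w for w in words if is_prime(w, removal)), key=lambda w: (len(w), w))


print(primes_up_to(5, fin))      # ['13', '123', '1223', '12223']
print(primes_up_to(5, sub))      # ['13']
print(primes_up_to(5, scatsub))  # ['13']
```

The output is consistent with the primality type $(1, 1, \infty)$ stated above: the F-prime solutions keep growing with the bound, while 13 remains the only S-prime and P-prime solution.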
According to a result originally due to Higman (see [Hi, MS3]), every language $L$, with the property that no word in $L$ is a scattered subword of another word in $L$, is finite. This implies that the component $p$ is finite in every primality type.

A further restriction on primality types $(p, s, f)$ is due to the following result established in [SSY]. Assume that the word $xyz$ is in $F(g, h)$, where $x, y, z$ are nonempty, and that $xz$ is in $E(g, h)$. Then $xy^iz$ is in $F(g, h)$, for all $i \ge 0$. It is a consequence of this result that, whenever $s < f$, then $f = \infty$.

There are no further restrictions on primality types $(p, s, f)$. Indeed, given any triple $(p, s, f)$ such that either $p = s = f = 0$, or else (i) $1 \le p \le s \le f$, (ii) $p$ is finite and (iii) if $s < f$ then $f = \infty$, an instance $(g, h)$ having $(p, s, f)$ as its primality type can be effectively constructed. Details are given in [SSY].

We now take a step further and consider prime words over the target alphabet $\Delta$ in a similar way. We first introduce the "co-sets" corresponding to the sets $E(g, h)$, $F(g, h)$, $S(g, h)$ and $P(g, h)$:

$$\text{co-}E(g, h) = \{ \alpha \in \Delta^* \mid g(w) = h(w) = \alpha, \text{ for some } w \in \Sigma^+ \},$$
$$\text{co-}F(g, h) = \{ \alpha \in \text{co-}E(g, h) \mid \mathrm{fin}(\alpha) \cap \text{co-}E(g, h) = \{\alpha\} \},$$
$$\text{co-}S(g, h) = \{ \alpha \in \text{co-}E(g, h) \mid \mathrm{sub}(\alpha) \cap \text{co-}E(g, h) = \{\alpha\} \},$$
$$\text{co-}P(g, h) = \{ \alpha \in \text{co-}E(g, h) \mid \mathrm{scatsub}(\alpha) \cap \text{co-}E(g, h) = \{\alpha\} \}.$$

Words in the last three sets are called $F_c$-prime, $S_c$-prime and $P_c$-prime with respect to the instance $(g, h)$. Thus, a word $\alpha$ over $\Delta$ being an $F_c$-prime means that $\alpha$ is the morphic image (under $g$ and $h$) of a solution and, moreover, no proper prefix of $\alpha$ has this property. Whereas

$$\text{co-}E(g, h) = g(E(g, h)) = h(E(g, h)),$$

we have only

$$\text{co-}F(g, h) \subseteq g(F(g, h)) = h(F(g, h))$$

and, moreover, the inclusion is strict for some instances $(g, h)$. The situation is analogous as regards the sets $\text{co-}S(g, h)$ and $\text{co-}P(g, h)$.

Let $p_c$, $s_c$ and $f_c$ be the cardinalities of the sets $\text{co-}P(g, h)$, $\text{co-}S(g, h)$ and $\text{co-}F(g, h)$, respectively.
The six-tuple $(p_c, p, s_c, s, f_c, f)$ is defined to be the mixed primality type of the instance $(g, h)$. Again the inequalities

$$p_c \le s_c \le f_c, \quad p_c \le p, \quad s_c \le s, \quad f_c \le f$$

are easily established.

A further restriction on mixed primality types is the following unexpected fact, established in [MS2]. If $s = 1$, then $f_c = 1$ or $f_c = \infty$. These are the only restrictions on mixed primality types. To summarize, let $(p_c, p, s_c, s, f_c, f)$ be a six-tuple of positive integers or $\infty$ satisfying each of the following conditions (i)-(vi):

(i) $p \le s \le f$,
(ii) $p_c \le s_c \le f_c$,
(iii) $p_c \le p$, $s_c \le s$, $f_c \le f$,
(iv) $p$ is finite,
(v) if $s < f$, then $f = \infty$,
(vi) if $s = 1$, then $f_c = 1$ or $f_c = \infty$.

Then an instance $(g, h)$ with this six-tuple as its mixed primality type can be effectively constructed. All details of the construction are given in [MS2].

We have, thus, an exhaustive characterization of both primality types and mixed primality types. We will view next the problem of primality from the opposite angle, that is, the procedure is reversed. Given a word $w$, we search for an instance $(g, h)$ of the PCP such that $w$ is an F-prime (resp. S-prime, P-prime) solution of $(g, h)$.

3 P-, S- and F-words

The study of prime words, initiated in [MS1], has turned out to be especially interesting from the point of view of combinatorics on words. In this section we take a closer look at the properties of P-, S- and F-words.

By applying Makanin's result, [Ma], on the solvability of a finite system of equations, we can prove that each of the properties of being a P-word, an S-word or an F-word is decidable. Nevertheless, because of the difficulty of Makanin's general algorithm we gain very little information about the nature of these prime words. Therefore we have tried to seek other methods to characterize them.

The following two notions are important in the sequel. We say that $u$ is Parikh shorter than $v$ if $\psi(u) \le \psi(v)$ componentwise. The basic Parikh vector $\psi_0(w)$ of $w$ is obtained from the Parikh vector $\psi(w) = (x_1, \ldots, x_n)$ by dividing it by $\gcd(x_1, \ldots, x_n)$.
Thus the components of the basic Parikh vector are always relatively prime. (Notice that this is not the same as being pairwise relatively prime, as seen from the vector $(2, 6, 9)$.) The main result in [Li3] says that for each word $w$ we can effectively find an instance $(g, h)$ such that $w$ is a solution and any other word $w'$ which is Parikh shorter than $w$ is also a solution if and only if $w$ and $w'$ have the same basic Parikh vectors, $\psi_0(w) = \psi_0(w')$.

Let us consider an example. For a word $w = 112321323$ we have $\psi(w) = (3, 3, 3)$ and $\psi_0(w) = (1, 1, 1)$. Thus by the previous result we can find an instance $(g, h)$ such that $w$ and all the words whose Parikh vectors are $(2, 2, 2)$ or $(1, 1, 1)$ (such as 132312 and 312) are solutions but, for instance, 112, 1123 or any other proper prefix of $w$ is not a solution for this instance $(g, h)$. But this means that $w$ is an F-prime solution and hence an F-word. An instance showing this is the following.

         1           2        3
    g    $a^{21}$    $a$      $a$
    h    $a$         $a^5$    $a^{17}$

Let us call a word ratioprimitive if none of its proper prefixes has the same basic Parikh vector as the whole word. We can easily show that F-words are exactly ratioprimitive words. Hence we have obtained an effective algorithm for F-words: it is easy to check for a given word whether it is ratioprimitive or not. For instance, the words 112321323 and night are F-words while 112323123 and noon are not F-words.

For S- and P-words the results obtained so far are not as strong. Using the result above, we can prove that any subratioprimitive word is also an S-word, where a word is called subratioprimitive when none of its proper subwords has the same basic Parikh vector as the whole word. Moreover, any word whose Parikh vector is the same as its basic Parikh vector, i.e., its components are relatively prime, is a P-word. Another subclass of P-words, found in [MS1], are the words of the form $a_1^{i_1} a_2^{i_2} \cdots a_n^{i_n}$ where $\Sigma = \{a_1, \ldots, a_n\}$ and $n \ge 2$. Thus, for instance, 12321 and 113322 are P-words.
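
The notions of this passage translate directly into a few lines of code. The sketch below is illustrative and not taken from [Li3]; the function names are assumptions, and the unary instance from the table is represented only by the lengths of its images, which suffices because both morphisms map into $a^*$.

```python
from functools import reduce
from math import gcd


def parikh(word, alphabet):
    """Parikh vector of word with respect to an ordered alphabet."""
    return tuple(word.count(letter) for letter in alphabet)


def basic_parikh(word, alphabet):
    """Parikh vector divided by the gcd of its components (word nonempty)."""
    vector = parikh(word, alphabet)
    divisor = reduce(gcd, vector)
    return tuple(x // divisor for x in vector)


def is_ratioprimitive(word):
    """No proper nonempty prefix has the basic Parikh vector of the whole word."""
    alphabet = sorted(set(word))
    target = basic_parikh(word, alphabet)
    return all(basic_parikh(word[:i], alphabet) != target for i in range(1, len(word)))


def is_subratioprimitive(word):
    """No proper nonempty subword has the basic Parikh vector of the whole word."""
    alphabet = sorted(set(word))
    target = basic_parikh(word, alphabet)
    n = len(word)
    return all(basic_parikh(word[i:j], alphabet) != target
               for i in range(n) for j in range(i + 1, n + 1) if j - i < n)


# Unary instance from the table: only the lengths of g(x) and h(x) matter.
G_LEN = {"1": 21, "2": 1, "3": 1}
H_LEN = {"1": 1, "2": 5, "3": 17}


def is_solution(word):
    return word != "" and sum(G_LEN[c] for c in word) == sum(H_LEN[c] for c in word)


for word in ["112321323", "night", "112323123", "noon"]:
    print(word, is_ratioprimitive(word))
print([w for w in ["112321323", "132312", "312", "112", "1123"] if is_solution(w)])
```

The first loop reports `True` for 112321323 and night and `False` for 112323123 and noon; the last line keeps exactly the three words with basic Parikh vector $(1, 1, 1)$.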
We can prove that these properties do not characterize P-words or S-words completely. For instance, 311132223 is an S-word but not subratioprimitive ($\psi_0(132) = \psi_0(311132223)$) and 122313 is a P-word though $\psi(122313) = 2(1, 1, 1)$. These claims are established by the following instances.

           1         2         3                  1        2         3
    $g_1$  $a^6b$    $a$       $a^2$       $g_2$  $a$      $a^8b$    $a$
    $h_1$  $a^2$     $ba^6$    $a$         $h_2$  $a^2$    $a^2$     $a^3ba^3$

Nevertheless, we conjecture that in a binary alphabet $\{a, b\}$ a nonempty word $w$ is a P-word and an S-word if and only if $\psi(w) = \psi_0(w)$ or $w \in a^+b^+ \cup b^+a^+$.

One way to prove this conjecture is to show that words that are not P-type, that is $\psi(w) \ne \psi_0(w)$ and $w \notin a^+b^+ \cup b^+a^+$, must be periodicity forcing. In other words, every instance $(g, h)$ for which they are solutions is trivial (for some $a \in \Sigma$, $g(a) = h(a)$) or periodic (for some $u \in \Delta^+$, $g(a) = u^{i_a}$, $h(a) = u^{j_a}$ for all $a \in \Sigma$). Intuitively, as the name says, these words force their corresponding instances to be periodic (or trivial). We have a lot of examples supporting this conjecture but no formal proof. For instance, the words $w$ where $\psi(w) = k(1, 1, 1)$, $2 \le k \le 4$, and $w \notin a^+b^+ \cup b^+a^+$ are all periodicity forcing and hence not P- and S-words.

Also in larger alphabets (with at least three letters) we can use the property of being periodicity forcing to characterize words that are not P- or S-words. It is easy to show that if a periodicity forcing word is not subratioprimitive (resp. P-type) then it cannot be an S-word (resp. a P-word). Thus the study of periodicity forcing words can shed new light on the characterization of P- and S-words. But not completely, since there are non-S-words, such as 1212123123, which are not subratioprimitive but not periodicity forcing either.

Our attempt to characterize prime words benefits also from the fact that the properties of not being a P-word or an S-word are preserved by nonerasing morphisms. In this way we can enlarge the sets of non-P-words and non-S-words found so far. For instance, since 112122 is not a P-word also 112312323 and 121231233 are not P-words.
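
The two instances in the table can be checked by exhaustive removal. The sketch below is illustrative; the helper names are assumptions, and the dictionaries simply spell out the table entries. It confirms that 311132223 is an S-prime solution of $(g_1, h_1)$ and that 122313 is a P-prime solution of $(g_2, h_2)$.

```python
from itertools import combinations

# The two instances from the table, with powers written out.
G1 = {"1": "a" * 6 + "b", "2": "a", "3": "aa"}
H1 = {"1": "aa", "2": "b" + "a" * 6, "3": "a"}
G2 = {"1": "a", "2": "a" * 8 + "b", "3": "a"}
H2 = {"1": "aa", "2": "aa", "3": "aaa" + "b" + "aaa"}


def apply(morphism, word):
    return "".join(morphism[letter] for letter in word)


def is_solution(g, h, word):
    return word != "" and apply(g, word) == apply(h, word)


def sub(w):
    """Words obtained from w by removing one (possibly empty) subword."""
    return {w[:i] + w[j:] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def scatsub(w):
    """All scattered subwords of w."""
    return {"".join(w[i] for i in idx)
            for r in range(len(w) + 1)
            for idx in combinations(range(len(w)), r)}


def is_prime(g, h, w, removal):
    """w is a solution and no other word in removal(w) is a solution."""
    return is_solution(g, h, w) and all(
        v == w or not is_solution(g, h, v) for v in removal(w))


print(is_prime(G1, H1, "311132223", sub))      # True
print(is_prime(G2, H2, "122313", scatsub))     # True
```

For $(g_1, h_1)$ a solution must contain equally many occurrences of 1, 2 and 3, and the only subword of 311132223 with Parikh vector $(1, 1, 1)$ is 132; removing it leaves 311223, which is not a solution.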
Next we study the hierarchy between P-, S- and F-words. It follows from the definition that every P-word is also an S-word and every S-word is also an F-word. So we only need to settle whether these inclusions are proper or not.

The relation between S- and F-words turns out to be strict in every alphabet. We can prove that each word $w = (123 \cdots (k-1))^2 (123 \cdots k)^2$ over a $k$-letter alphabet, $k \ge 2$, is an F-word since it is ratioprimitive but not an S-word because every time $w$ is a solution for some instance $(g, h)$ then also $w' = 123 \cdots (k-1)\,123 \cdots k$ is a solution and $w' \in \mathrm{sub}(w)$.

For P- and S-words the result is slightly weaker. We can prove that if the alphabet has at least three letters then the relation is proper. In a three-letter alphabet the word $w = a^2b^3c^3a^4b^3c^3a^3$ is an S-word, being subratioprimitive, but not a P-word since it is periodicity forcing and not P-type (we have $\psi(w) = 3(3, 2, 2)$). In bigger alphabets we, respectively, consider the morphic images of the word $w$ under the morphism

$$\varphi(a) = a, \quad \varphi(b) = b, \quad \varphi(c) = c_1 \cdots c_n.$$

As mentioned above $\varphi(w)$ cannot be a P-word since $w$ is not a P-word but it is still subratioprimitive and, hence, an S-word.

The only case that still remains open is the binary alphabet for P- and S-words. We conjecture that in this case P- and S-words are actually equal and, what is more, the sets of P-prime and S-prime solutions are the same for all instances $(g, h)$. In fact, we can prove that if at least one of the morphisms $g$ and $h$ is periodic then $P(g, h) = S(g, h)$. This follows from the characterization of the equality sets in these types of instances. On the other hand, if $g$ and $h$ are both injective then the equality set is either of the form $\{u, v\}^+$, $\{w\}^+$ or $(\alpha\gamma^*\beta)^+$ (see [EKR] for details) and, hence, in the last two cases the sets of S-prime and P-prime solutions are again equal.
The case $\{u, v\}^+$ seems also to be clear; at least the only known examples of equality sets generated by two words are $\{1^i2, 21^i\}^+$ for some fixed $i \ge 1$ and here $P(g, h) = \{1^i2, 21^i\} = S(g, h)$. An interesting observation is that the conjecture concerning periodicity forcing words (every non-P-type word is periodicity forcing) would prove this conjecture, too.

The end of this section is devoted to prime languages of the Post Correspondence Problem. There are two types of definitions, based on equality or inclusion. We say that $L$ is an F-language in equation sense if, for some instance $(g, h)$, $L = F(g, h)$ and in inclusion sense if $L \subseteq F(g, h)$. The following two figures show how prime languages lie in the Chomsky hierarchy. In the first figure we use the equation method. Notice that in this case P-, S- and F-languages do not form an increasing hierarchy.

[Figure: P-, S- and F-languages in equation sense, drawn against the regular (Reg), context-free (C-free) and context-sensitive (C-sensitive) languages.]

The second picture settles the relation for the inclusion method.

[Figure: P-, S- and F-languages in inclusion sense, drawn against the regular (Reg), context-free (C-free) and context-sensitive (C-sensitive) languages.]

Examples of all areas in these pictures can be given.

[Li4] studies also other properties of prime languages. Using Makanin's result, [Ma], as for prime words, we can prove that for a finite language $L$ it is decidable whether $L$ is a P-language, an S-language or an F-language in inclusion sense. In this type of prime languages we can actually characterize F-languages completely in a binary alphabet since $L$ is an F-language if and only if every element of $L$ is ratioprimitive and all of them have the same basic Parikh vector. For S- and P-languages we can only say that if every element of $L$ has the same Parikh vector whose components are relatively prime or $L \subseteq \{a^ib, ba^i\}$ for some $i \ge 1$, then $L$ is a P-language and an S-language.

The main characterization result for P- and S-languages in equation sense is that if $L$ has at least three elements then $L$ is a P-language and an S-language in a binary alphabet if and only if $L$ is of the form $c(\{a^ib^j\})$ with $\gcd(i, j) = 1$.
The set $c(\{a^ib^j\})$, $\gcd(i, j) = 1$, consists of all the words whose Parikh vector is $(i, j)$, where $i$ and $j$ are relatively prime. For instance, $c(\{a^3b^1\}) = \{aaab, aaba, abaa, baaa\}$. For F-languages in equation sense we can only say that if $L$ is finite and it has at least three elements then $L$ cannot be an F-language.

Notice that also in connection with prime languages periodicity forcing words play an important role. For instance, the hierarchy of P-, S- and F-languages in equation sense was proved not to be increasing, using periodicity forcing words.

A further interesting topic of prime languages is the finiteness of P-languages. This follows from the result of the second section where $P(g, h)$ was proved to be finite for every instance $(g, h)$. Thus the question arises whether the maximal number of P-prime solutions can be counted in terms of the cardinality of the domain alphabet and the size of the instance $(g, h)$, that is $\max\{|g(a_i)|, |h(a_i)| : a_i \in \Sigma\}$. In a binary alphabet this has been completely settled in [Li4], the maximal number being

$$\frac{[(l-2) + (l-1)]!}{(l-2)!\,(l-1)!}$$

where the size of the instance is at most $l$. Thus if $l = 5$, the maximal number is $7!/(3!\,4!) = 35$. In larger alphabets the question is settled for periodic instances and we conjecture that this is also the maximum in every instance. The algorithm is, however, too long to be presented here.

4 Variations

So far we have studied only three types of prime solutions based on removing a suffix, a subword or a scattered subword. In [Li4] also other ways to define primality were studied.

We start with strongly prime PCP-words, based on [LP]. This means that words are primitive for every (not only some) instance $(g, h)$ for which they are solutions. This definition turns out to be too restrictive (even after leaving out instances $(g, h)$ where $g = h$) since in alphabets with at least three letters no word is strongly prime: for every word we are able to find an instance for which the word is not primitive.
In a binary alphabet, however, we have managed to completely characterize these new types of words. We can prove that any word $w$ is strongly P-prime and S-prime if and only if its Parikh vector is the same as its basic Parikh vector, $\psi(w) = \psi_0(w)$, whereas any strongly F-prime is also F-prime and vice versa. Thus if strongly prime words are denoted with upper primes we obtain the following hierarchy

$$P'_2 = S'_2 \subset P_2 \subseteq S_2 \subset F_2 = F'_2$$

where all the inclusions are proper except possibly $P_2 = S_2$, as already mentioned in the previous section.

For many purposes it is useful to regard primitive solutions as obtained from the equality set $E(g, h)$ by certain operations, such as partial orders. We now study solutions that are minimal with respect to the given partial order. For instance, $P(g, h)$ is obtained from $E(g, h)$ by $\le_s$ where $u \le_s w$ if $u$ is a scattered subword of $w$, and $F(g, h)$ is obtained by $\le_p$ where $u \le_p w$ if $u$ is a prefix of $w$. In the same way we obtain new primitive solutions by comparing their lengths, Parikh vectors, lexicographic order and so on.

The primitive solutions obtained by the first-mentioned two partial orders (length, Parikh vector) are especially interesting since their cardinalities $\ell$ and $\pi$ in every instance $(g, h)$ together with $p$ from $P(g, h)$ form a new primality type $(\ell, \pi, p)$ called a finite primality type which can be fully characterized. We can prove that the triple $(\ell, \pi, p)$ is a finite primality type if and only if either $0 = \ell = \pi = p$ or $1 \le \ell \le \pi \le p < \infty$.

The end of this section is devoted to a third type of primality. We consider ways of adding an arbitrary language $L$ to the definitions of P-, S- and F-prime solutions. More specifically, a solution is called $F_L$-prime if it cannot be divided in such a way that the prefix is a solution and the suffix belongs to $L$. So the only difference with the usual definition of F-prime solutions is that the part to be removed should now be an element of $L$. Consequently, $F_{\Sigma^*}(g, h) = F(g, h)$.
With this new definition it is especially interesting to study the effect of the different set-theoretic operations of languages on the corresponding $L$-prime solutions. For instance, it turns out that $F_{L_1 \cup L_2}(g, h) = F_{L_1}(g, h) \cap F_{L_2}(g, h)$ for all languages $L_1$ and $L_2$ but we can only say that $F_{L_1}(g, h) \cup F_{L_2}(g, h) \subseteq F_{L_1 \cap L_2}(g, h)$.

[Li4] also studies $L$-prime solutions in connection with some special languages such as flat words or tense words. We say that a word is a $k$-flat word if and only if all its subwords of length $r$, $r \le k + 1$, contain each letter at most once. For instance, $w = abcbcad$ is a 1-flat word but not a 2-flat word. Now the flat prime solutions form an increasing hierarchy in every instance $(g, h)$ where $\mathrm{card}(\Sigma) = n$:

[Figure: inclusion diagram of the flat prime solutions. Its nodes are $E(g, h)$ and, for $k = 1, \ldots, n$, the sets $F_{k\text{-flat}}(g, h)$, $S_{k\text{-flat}}(g, h)$ and $P_{k\text{-flat}}(g, h)$, with $F(g, h)$, $S(g, h)$ and $P(g, h)$ in the bottom row; arrows denote inclusions.]

The hierarchy can also collapse in some instances $(g, h)$. For instance, if $|F_{k\text{-flat}}(g, h)| < \infty$ then $F_{k\text{-flat}}(g, h) = S_{k\text{-flat}}(g, h) = F(g, h) = S(g, h)$ and if $|F(g, h)| < \infty$ then either $S_{k\text{-flat}}(g, h) = F(g, h) = S(g, h)$ or $|S_{k\text{-flat}}(g, h)| = \infty$.

The hierarchy of the corresponding words, flat prime words, can also be characterized. It turns out that it is the same as for solutions except that every inclusion (an arrow) is proper in the previous figure and that no other relations exist; that is, the sets $P_{k\text{-flat}}$ and $S_{(k-1)\text{-flat}}$ as well as $S_{k\text{-flat}}$ and $F_{(k-1)\text{-flat}}$ are incomparable.

Tense words are in some sense opposite to flat words. Now each subword of length $k$ should contain all letters of the alphabet. For instance, $abca$ is 3-tense and $ababa$ is 2-tense. The tense prime words form again an increasing hierarchy.
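
Both word classes are straightforward to test. The sketch below is an illustration only; the function names are assumptions, and `is_k_tense` takes the alphabet to be the set of letters occurring in the word unless one is passed explicitly.

```python
def factors(word, length):
    """All subwords (contiguous factors) of the given length."""
    return [word[i:i + length] for i in range(len(word) - length + 1)]


def is_k_flat(word, k):
    """Every subword of length r <= k + 1 contains each letter at most once."""
    return all(len(set(f)) == len(f)
               for r in range(1, k + 2)
               for f in factors(word, r))


def is_k_tense(word, k, alphabet=None):
    """Every subword of length k contains all letters of the alphabet."""
    letters = set(word) if alphabet is None else set(alphabet)
    return all(letters <= set(f) for f in factors(word, k))


print(is_k_flat("abcbcad", 1), is_k_flat("abcbcad", 2))   # True False
print(is_k_tense("abca", 3), is_k_tense("ababa", 2))      # True True
```

The word abcbcad fails to be 2-flat because its subword bcb repeats the letter b.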
This time, however, we study the F-, S- and P-hierarchies separately:

$$F(g,h) \subseteq \cdots \subseteq F^2_{4\text{-tense}}(g,h) \subseteq F^2_{3\text{-tense}}(g,h) \subseteq F^2_{2\text{-tense}}(g,h) \subseteq E(g,h);$$
$$S(g,h) \subseteq \cdots \subseteq S^2_{4\text{-tense}}(g,h) \subseteq S^2_{3\text{-tense}}(g,h) \subseteq S^2_{2\text{-tense}}(g,h) \subseteq E(g,h);$$
$$P(g,h) \subseteq \cdots \subseteq P^2_{4\text{-tense}}(g,h) \subseteq P^2_{3\text{-tense}}(g,h) \subseteq P^2_{2\text{-tense}}(g,h) \subseteq E(g,h);$$

where the superscript 2 indicates the binary alphabet.

The main results in [Li4] concerning the hierarchies settle the finiteness of the three hierarchies. The S- and P-hierarchies turn out to be finite in every instance $(g, h)$. This means that there always exists a $k$ for which $S^2_{k\text{-tense}}(g,h) = S^2_{(k+1)\text{-tense}}(g,h) = S^2_{(k+2)\text{-tense}}(g,h)$ and so on. For F-hierarchies this is also the case if $g$ or $h$ is injective. If both are periodic, then the corresponding F-hierarchy is always infinite: for every $k$, $F^2_{k\text{-tense}}(g,h) \setminus F^2_{(k+1)\text{-tense}}(g,h)$ is not empty.

5 Open problems. Conclusion

In addition to the conjectures we have presented throughout the text, we would like to mention some other open problems. It seems that the binary alphabet makes an exception to many rules. Also, the results are usually easier to achieve. The instances we can construct there are never very complicated, which indicates that the binary case can possibly be completely settled.

But in larger alphabets many questions remain open. For instance, the nature of P- and S-words is beyond our present knowledge. Prime languages, on the other hand, lack a characterization in all alphabets.

We have emphasized the expressibility of equality sets: arbitrary computations can be expressed in terms of elements of equality sets. In view of the initial observations in [MSSY], it is an interesting research area to establish other explicit interconnections between primality of solutions and simplicity of computations.

References

[C] K. Culik II: A purely homomorphic characterization of recursively enumerable sets, J. Assoc. Comput. Mach. 26 (1979) 345-350.
[EKR] A. Ehrenfeucht, J. Karhumäki, G. Rozenberg: On binary equality sets and a solution to the test set conjecture in the binary case, J. Algebra 85 (1983) 76-85.
[HLM] T. Harju, M. Lipponen, A. Mateescu: Flatwords and Post Correspondence Problem, Theoret. Comput. Sci., to appear.
[Hi] G. Higman: Ordering by divisibility in abstract algebras, Proc. London Math. Soc. 2 (1952) 326-336.
[Lila] L. Kari: DNA Computers, Tomorrow's Reality, EATCS Bull. 59 (1996) 256-266.
[Li1] M. Lipponen: Primitive words and languages associated to PCP, EATCS Bull. 53 (1994) 217-226.
[Li2] M. Lipponen: Post Correspondence Problem: words possible as primitive solutions, Proc. 22nd ICALP, Springer LNCS 944 (1995) 63-74.
[Li3] M. Lipponen: On F-prime solutions of the Post Correspondence Problem. In J. Dassow, G. Rozenberg and A. Salomaa (eds.) Developments in Language Theory II (Magdeburg, 1995), World Scientific, Singapore (1996) 139-147.
[Li4] M. Lipponen: On primitive solutions of the Post Correspondence Problem, Turku Centre for Computer Science (TUCS) Dissertations, No 1 (1996). http://www.tucs.abo.fi/publications/dissertations
[LP] M. Lipponen, Gh. Păun: Strongly prime PCP words, Discrete Appl. Math. 63 (1995) 193-197.
[Ma] G.S. Makanin: The problem of solvability of equations in a free semigroup (in Russian), Mat. Sb. 103 No. 145 (1977) 148-236.
[MS1] A. Mateescu, A. Salomaa: PCP-prime words and primality types, RAIRO Inform. Theor. 27 (1993) 57-70.
[MS2] A. Mateescu, A. Salomaa: On simplest possible solutions for Post Correspondence Problems, Acta Inform. 30 (1993) 441-457.
[MS3] A. Mateescu, A. Salomaa: Formal languages, an introduction and a synopsis. In G. Rozenberg and A. Salomaa (eds.) Handbook of Formal Languages, I-III, Springer-Verlag, forthcoming.
[MSSY] A. Mateescu, A. Salomaa, K. Salomaa, Sheng Yu: P, NP and Post Correspondence Problem, Inform. and Control 121 (1995) 135-142.
[Post] E. Post: A variant of a recursively unsolvable problem, Bull. Amer. Math. Soc. 53 (1946) 264-268.
[RS] G. Rozenberg, A. Salomaa: Cornerstones of Undecidability, Prentice Hall (1994).
[Sa1] A. Salomaa: Jewels of Formal Languages, Computer Science Press (1981).
[Sa2] A. Salomaa: What Emil said about the Post Correspondence Problem. In G. Rozenberg and A. Salomaa (eds.) Current Trends in Theoretical Computer Science, World Scientific, Singapore (1993) 563-571.
[SSY] A. Salomaa, K. Salomaa, Sheng Yu: Primality types of instances of the Post Correspondence Problem, EATCS Bull. 44 (1991) 226-241.

Turku Centre for Computer Science
Lemminkäisenkatu 14
FIN-20520 Turku
Finland
http://www.tucs.abo.fi

University of Turku
  • Department of Mathematical Sciences
Åbo Akademi University
  • Department of Computer Science
  • Institute for Advanced Management Systems Research
Turku School of Economics and Business Administration
  • Institute of Information Systems Science

Journal title:
  • Bulletin of the EATCS

Volume 60  Issue 

Pages  -

Publication date 1996